6 research outputs found

    A study of the translation of sentiment in user-generated text

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Emotions are biological states of feeling that humans may verbally express to communicate their negative or positive mood, influence others, or even inflict harm. Although emotions such as anger, happiness, affection, or fear are supposedly universal experiences, the linguistic realisation of the emotional experience may vary in subtle ways across different languages. For this reason, preserving the original sentiment of the source text has always been a challenging task that draws on a translator's competence and finesse. In the professional translation industry, an incorrect translation of the sentiment-carrying lexicon is considered a critical error, as it can be misleading or in some cases harmful, since it misses the fundamental aspect of the source text, i.e. the author's sentiment. Since the advent of Neural Machine Translation (NMT), there has been a tremendous improvement in the quality of automatic translation. This has led to an extensive use of online NMT tools to translate User-Generated Text (UGT) such as reviews, tweets, and social media posts, where the main message is often the author's positive or negative attitude towards an entity. In such scenarios, the process of translating the user's sentiment is entirely automatic, with no human intervention for either post-editing or accuracy checking. However, NMT output still lacks accuracy in some low-resource languages and sometimes contains critical translation errors that may not only distort the sentiment but at times flip the polarity of the source text to its exact opposite. In this thesis, we tackle the translation of sentiment in UGT by NMT systems from two perspectives: analytical and experimental.
First, the analytical approach introduces a list of linguistic features that can lead to a mistranslation of fine-grained emotions between different language pairs in the UGT domain. It also presents an error typology specific to Arabic UGT, illustrating the main linguistic phenomena that can cause mistranslation of sentiment polarity when translating Arabic UGT into English by NMT systems. Second, the experimental approach attempts to improve the translation of sentiment by addressing some of the linguistic challenges identified in the analysis as causing mistranslation of sentiment at both the word level and the sentence level. At the word level, we propose a Transformer NMT model trained on a sentiment-oriented vector space model (VSM) of UGT data that is capable of translating the correct sentiment polarity of challenging contronyms. At the sentence level, we propose a semi-supervised approach to overcome the problem of translating sentiment expressed by dialectical language in UGT data. We take the translation of dialectical Arabic UGT into English as a case study. Our semi-supervised AR-EN NMT model shows improved performance over the online MT Twitter tool in translating dialectical Arabic UGT, not only in terms of translation quality but also in the preservation of the sentiment polarity of the source text. The experimental section also presents an empirical method to quantify the notion of sentiment transfer by an MT system and, more concretely, to modify automatic metrics such that their MT ranking comes closer to a human judgement of a poor or good translation of sentiment.

    Towards a better understanding of Tarajem: creating topological networks for Arabic biographical dictionaries

    Biographical writing is one of the earliest and most extensive forms of Arabic literature. Some scholars tend to assume that classical Arabic biographies, widely known as Tarāǧim, arose in conjunction with the study of the reliability of the Hadith transmitters (the reciters of the Prophet Mohammad's sayings), which led to a proliferation of biographical material collected and used to assess the transmitters' trustworthiness. However, a scrutiny of the well-known classical Arabic biographical dictionaries, such as Siyaru 'A`lāmi an-Nubalā' ('The Lives of the Noble Figures') by Adh-Dhahabī, shows that they extend their entries to other classes of persons important to the development of particular fields, such as Islamic jurisprudents, rulers, poets, philosophers or physicians. The main contribution of Arabic biographical dictionaries is the cumulative value of the thousands of life histories which construct a picture of Islamic society in different eras. An Arabic biographical dictionary, therefore, is predominantly used by scholars to look up an eminent person's achievements and historical background. In this project, however, we explore Arabic biographies as a prosopography, rather than a biography in the strict sense. We introduce a novel method for a better understanding of Arabic biographical dictionaries by creating a network of relations among different persons. We utilise Natural Language Processing (NLP) tools to create a topological network from the unstructured data of 45,500 biographical entries collected from different dictionaries. We aim to illustrate how network analysis leveraged by NLP tools can provide scholars with innovative methods for discovering complex constellations of relations between prominent and non-prominent figures spanning several eras and different fields of knowledge. We also use graph visualisation as a means to effectively communicate and explore such complex constellations.
Each network visualisation is purposefully designed to be as simple and robust as possible, offering scholars a way to move relatively fluidly across the large scale of biographical entries and to easily interpret the minute ties between persons from different walks of life. We make both our data and code publicly available for researchers to replicate the experiment. They can be found at: https://github.com/sadanyh/Relational-Network-for-Arabic-Taraje
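As a rough illustration of the idea of turning biographical entries into a network of relations, the following minimal sketch links two persons whenever one entry mentions the other's name. It is not the authors' pipeline: the function names `build_relation_network` and `degree_ranking`, the plain substring matching, and the placeholder entries are all assumptions made for this example, and a real system would rely on NLP tools for name recognition instead.

```python
from collections import defaultdict


def build_relation_network(entries):
    """Build an undirected graph from biographical entries.

    entries: dict mapping a person's name to the text of their entry.
    An edge links a person to every other known person whose name
    appears in their entry (naive substring matching, an assumption).
    """
    names = list(entries)
    graph = defaultdict(set)
    for person, text in entries.items():
        for other in names:
            if other != person and other in text:
                graph[person].add(other)
                graph[other].add(person)
    # Return a plain dict with sorted neighbour lists, including isolated persons.
    return {person: sorted(graph[person]) for person in names}


def degree_ranking(graph):
    """Rank persons by number of ties (ties in rank broken alphabetically)."""
    return sorted(graph, key=lambda person: (-len(graph[person]), person))


# Hypothetical placeholder entries, not taken from any dictionary.
toy_entries = {
    "Scholar A": "Studied under Scholar B.",
    "Scholar B": "Taught in Baghdad.",
    "Scholar C": "A poet who corresponded with Scholar B.",
}
network = build_relation_network(toy_entries)
print(network)
print(degree_ranking(network))
```

In this toy graph the most connected person is the one mentioned by both other entries, which is the kind of tie a visualisation would make visible at a glance.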

    RGCL at IDAT: deep learning models for irony detection in Arabic language

    This article describes the system submitted by the RGCL team to the IDAT 2019 Shared Task: Irony Detection in Arabic Tweets. The system detects irony in Arabic tweets using deep learning. The paper evaluates the performance of several deep learning models, as well as how text cleaning and text pre-processing influence the accuracy of the system. Several runs were submitted. The highest F1 score achieved by one of the submissions was 0.818, ranking the RGCL team 4th out of 10 teams in the final results. Overall, we present a system that uses minimal pre-processing but is capable of achieving competitive results.
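For readers unfamiliar with the metric quoted above, the sketch below computes the F1 score of a binary classifier as the harmonic mean of precision and recall. The function name `f1_score`, the label encoding (1 for ironic, 0 for not ironic) and the toy labels are assumptions for illustration; the abstract does not state how the shared task averaged its scores.

```python
def f1_score(gold, predicted, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == positive and p == positive)
    false_pos = sum(1 for g, p in pairs if g != positive and p == positive)
    false_neg = sum(1 for g, p in pairs if g == positive and p != positive)
    if true_pos == 0:
        # No correct positive predictions: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = ironic, 0 = not ironic.
gold_labels = [1, 1, 0, 0]
system_labels = [1, 0, 1, 0]
print(f1_score(gold_labels, system_labels))
```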